3/9/2021

Introduction

About Me

Introduction

Urban Heat Islands

Local Climate Zones

Local Climate Zone classes. Originally from Stewart and Oke (2012) and remade by Bechtel et al. (2017). Copyright CC-BY 4.0

Objective

Yoo

Methods - Data - LCZ

The LCZ reference data

Methods - Data - Landsat

The Landsat 8 data

All 9 available bands of each of the 4 Landsat scenes amounted to 36 input variables. Each pixel is an observation.
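The slides do not include code; as a hedged sketch, here is how a pixel-by-variable table like this could be assembled with NumPy (the array shapes and random values are purely illustrative, not the actual scenes):

```python
import numpy as np

# Assumed shapes for illustration: 4 Landsat 8 scenes, 9 bands each,
# over a 100 x 120 pixel raster.
n_scenes, n_bands, height, width = 4, 9, 100, 120
scenes = np.random.rand(n_scenes, n_bands, height, width)

# Flatten so each pixel becomes one observation (row) with
# 4 scenes * 9 bands = 36 predictor variables (columns).
X = scenes.reshape(n_scenes * n_bands, height * width).T
print(X.shape)  # (12000, 36)
```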

Methods - data - Step 1 - train vs test (CHALLENGE)

Random Forests - decision tree

Random forest - impurity

Splits are typically evaluated by Gini impurity or entropy:

\[ \text{Gini Impurity} =\ I_G(t)\ = 1 - \sum_{i=1}^{C}p(i|t)^2 \] \[ \text{Entropy} =\ I_H(t)\ = -\sum_{i=1}^{C}p(i|t)\log_2p(i|t) \]

Where \(i\) is a class of the response variable, ranging from 1 to \(C\). \(C\) is the total number of classes represented at a particular node, \(t\). \(p(i|t)\) is the proportion of samples at node \(t\) that belong to class \(i\).
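These two definitions can be checked numerically; a minimal Python sketch (not from the original slides):

```python
import numpy as np

def gini_impurity(labels):
    # I_G(t) = 1 - sum_i p(i|t)^2
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # I_H(t) = -sum_i p(i|t) * log2 p(i|t)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

node = [1, 1, 1, 2, 2, 3]     # class labels of the samples at one node
print(gini_impurity(node))    # ~0.611
print(entropy(node))          # ~1.459 bits
```

A pure node (all one class) gives 0 under both measures; impurity is maximized when the classes are evenly mixed.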

Random Forest - How random forests differ from decision trees (+ prediction)

Tuning parameters and OOB error (maybe two slides)

Accuracy Assessment

In line with the methods used in our reference paper and the remote sensing field, accuracy metrics will include the following:

\[ \text{Overall Accuracy}= OA= \frac{\text{number of correctly classified reference sites}}{\text{total number of reference sites}} \]

\(OA_{urb}\) and \(OA_{nat}\) will also be used; these are computed the same way as \(OA\) but include only the urban and natural classes, respectively.

\[ UA(z)\ = \frac{\text{number of correctly identified pixels in class z}}{\text{total number of pixels identified as class z}} \] \[ PA(z) = \frac{\text{number of correctly identified pixels in class z}}{\text{number of pixels truly in class z}} \]

\(UA\) is a measure of user’s accuracy, which is also called precision or positive predictive value. \(PA\) is the measure of producer’s accuracy, also known as recall or sensitivity. The harmonic mean of \(UA\) and \(PA\) gives the \(F_1\) score, which is a measure of the model’s accuracy. An \(F_1\) Score closer to 1 indicates a model that has both low false positives and low false negatives.

\[ F_1\text{ Score} = 2\cdot\frac{UA \cdot PA}{UA+PA} \]
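All four metrics can be read off a confusion matrix; a minimal sketch (the matrix values are made up for illustration, not results from this study):

```python
import numpy as np

def accuracy_metrics(cm):
    """Per-class UA, PA, F1 and overall accuracy from a confusion
    matrix with rows = predicted class, columns = reference class."""
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)
    ua = correct / cm.sum(axis=1)   # user's accuracy (precision)
    pa = correct / cm.sum(axis=0)   # producer's accuracy (recall)
    f1 = 2 * ua * pa / (ua + pa)    # harmonic mean of UA and PA
    oa = correct.sum() / cm.sum()   # overall accuracy
    return oa, ua, pa, f1

# Toy 2-class confusion matrix
cm = [[40, 10],
      [5, 45]]
oa, ua, pa, f1 = accuracy_metrics(cm)
print(round(oa, 3))  # 0.85
```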

Results - Varying the Parameter for Number of Trees - 5 to 500 - OA

The parameter for the number of trees was initially varied between 5 and 500 at intervals of 5. The resulting overall accuracy metrics indicate a leveling off around 125 trees (Figure 2). There’s also a clear distinction between accuracy in urban vs. natural classes, with natural classes having a much higher overall accuracy.
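The slides show the resulting plot rather than the sweep itself; a hedged sketch of how such a sweep could look with scikit-learn, using synthetic stand-in data instead of the 36 Landsat predictors:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the pixel observations (36 predictors).
X, y = make_classification(n_samples=500, n_features=36,
                           n_informative=10, n_classes=3,
                           random_state=0)

oob = {}
for n_trees in range(25, 255, 50):
    rf = RandomForestClassifier(n_estimators=n_trees, oob_score=True,
                                bootstrap=True, random_state=0)
    rf.fit(X, y)
    oob[n_trees] = rf.oob_score_  # out-of-bag overall accuracy

print(oob)
```

Plotting `oob` against `n_trees` gives a curve analogous to Figure 2, which typically rises quickly and then levels off.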

The increase in OA metrics levels off around 125 trees. Urban classes (1-10) have much lower accuracy than natural classes (11-17). These metrics were calculated based on the out-of-bag dataset.

Results - Varying the Parameter for Number of Trees - 5 to 500 - F1

F-1 scores vary considerably between LCZ classes. As the number of trees in the random forest increases, the F-1 score also increases, leveling off around 100 trees. These metrics were calculated based on the out-of-bag dataset.

Results - include larger plots for 500-2500 or just mention? Probably just mention

Predicting on the Test Dataset - Validation Metrics Plot

OA and F-1 metrics dropped dramatically upon applying the random forest to the test data (Figure 4).

Accuracy among random forest predictions on the test dataset varied widely. The F-1 scores were lower than expected and do not appear to agree with the OA metrics. Classes 2, 5, 8, and 14 have particularly low F-1 scores.

Predicting on the Test Dataset - Importance Measures

There is not a clear pattern in Mean Decrease for Gini Impurity between the different bands and scenes, though there is some indication that bands in scene 4 were particularly effective as predictors.
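For reference, scikit-learn exposes the same mean-decrease-in-impurity measure as `feature_importances_`; a sketch with synthetic stand-in data and illustrative scene/band names (not the study's actual importances):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the 36 scene/band predictors.
X, y = make_classification(n_samples=300, n_features=36, random_state=1)
names = [f"scene{s}_band{b}" for s in range(1, 5) for b in range(1, 10)]

rf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

# feature_importances_ is the normalized mean decrease in impurity
# (Gini by default), averaged over all trees in the forest.
ranked = sorted(zip(names, rf.feature_importances_),
                key=lambda t: t[1], reverse=True)
for name, imp in ranked[:5]:
    print(name, round(imp, 3))
```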

A Full Prediction - 1 just landsat

A Full Prediction - 2 just lcz

Imagery of the area of interest. Each has a basemap of satellite reference imagery. Top Left: Only satellite reference. Top Right: One Landsat 8 Scene. Bottom: A fully predicted LCZ map.

Discussion - large decrease between OOB and test-data accuracy

Discussion - aggregate like OA can mask low f1 by class

Discussion interpretation? maybe these could all be one slide.

Conclusion - limitations, future work, etc

Questions + me + GitHub again